Multivariate TS Models

1. Introduction

In this section, we delve into the application of ARIMAX, SARIMAX, and VAR modeling techniques across a range of time series datasets pertinent to our project’s scope. ARIMAX and SARIMAX models come into play when we observe a unidirectional influence from one variable onto another. Conversely, VAR models are more suited to scenarios where a set of variables exhibits mutual influences.

Initiating this phase of the project involved a comprehensive literature review to identify the models previously employed to analyze COVID vaccination-related variables. The study by Yundari (2022) leverages the Vector Autoregressive (VAR) model to simultaneously analyze the impact of vaccination numbers (first dose) on new and recovered COVID-19 cases, utilizing daily data from January 13 to December 30, 2021, in West Kalimantan Province. Ye (2021) underscores the critical nature of escalating COVID-19 Daily Vaccination Numbers to curtail the pandemic, noting that achieving vaccination targets remains an uphill task. This study probes into how political partisanship might influence Daily Vaccination Numbers, drawing comparisons between Democratic and Republican counties. Asch (2024) delves into the correlation between political leanings and the reporting of vaccine adverse events (AEs), revealing that a 10% uptick in Republican votes at the state level correlates with a 5% increase in the likelihood of reporting a COVID-19 vaccine AE.

References

  1. Yundari, Y., & Huda, N. M. Analysis of the Impact of Vaccination on Daily New and Recovered COVID-19 Cases Using the Vector Autoregressive (VAR) Model: A Case Study of West Kalimantan. BAREKENG: Jurnal Ilmu Matematika dan Terapan. https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/5266

  2. Ye, X. (2021). Exploring the Relationship Between Political Partisanship and COVID-19 Vaccination Rate. https://academic.oup.com/jpubhealth/article/45/1/91/6409075

  3. Asch, D.A., Luo, C., & Chen, Y. (2024). Reports of COVID-19 Vaccine Adverse Events in Predominantly Republican vs Democratic States. JAMA Network Open, 7(3), e244177. https://doi.org/10.1001/jamanetworkopen.2024.4177

2. Define Models

Armed with enhanced insights into the relevant variables, this section is dedicated to defining the specific ARIMAX/SARIMAX/VAR models to be utilized for analyzing changes in Daily Vaccination Numbers. Each model is tailored to examine different facets of the interplay between COVID-19 vaccination efforts and socio-economic as well as political dynamics.

2.1 Confirmed COVID-19 Cases as a Function of Daily Vaccination Number:

(ARIMAX)Confirmed COVID-19 Cases ~ Daily Vaccination Number

This model aims to capture the impact of Daily Vaccination Numbers on the trend of newly confirmed COVID-19 cases.

Code
# Read the dataset
vac_df <- read_csv("Datasets/us_state_vaccinations.csv")

# Select relevant columns
cols_show <- c('date', 'location', 'daily_vaccinations_per_million')
t <- vac_df[, cols_show]

# Group by 'date' and summarize columns, ignoring NA values
t1 <- t %>%
  group_by(date) %>%
  summarize(
    daily_vaccinations_per_million = sum(daily_vaccinations_per_million, na.rm = TRUE)
  )

# Convert date column to Date format
t1$date <- as.Date(t1$date)

# Newly confirmed cases
wide_data <- read_csv("Datasets/covid_confirmed_usafacts.csv")

# Define the key and value columns for pivoting
key_cols <- c("countyFIPS", "County Name", "State", "StateFIPS")
value_cols <- setdiff(names(wide_data), key_cols)

# Pivot the data from wide to long
long_data <- pivot_longer(
  wide_data,
  cols = value_cols,
  names_to = "date",
  values_to = "value"
)

# Group by 'State' and 'date', and calculate the sum of Confirmed Cases
con_case_df <- long_data %>%
  group_by(date) %>%
  summarize(value_sum = sum(value, na.rm = TRUE))

# Convert date column to Date format
con_case_df$date <- as.Date(con_case_df$date)

# Merge them together
cava_df <- merge(t1, con_case_df, by = "date", all = FALSE)
head(cava_df)
        date daily_vaccinations_per_million value_sum
1 2020-12-20                              0  17848765
2 2020-12-21                            174  18010051
3 2020-12-22                            384  18204174
4 2020-12-23                            454  18416088
5 2020-12-24                            575  18609135
6 2020-12-25                            648  18704673
Code
# Shape of the df
cat('The shape of this dataframe is', dim(cava_df))
The shape of this dataframe is 872 3

2.2 COVID-19 Death Cases as a Function of Daily Vaccination Number:

(ARIMAX)COVID-19 Death Cases ~ Daily Vaccination Number

Through this model, we explore how variations in Daily Vaccination Numbers influence the count of COVID-19-related fatalities.

Code
# Read the dataset
vac_df <- read_csv("Datasets/us_state_vaccinations.csv")

# Select relevant columns
cols_show <- c('date', 'location', 'daily_vaccinations_per_million')
t <- vac_df[, cols_show]

# Group by 'date' and summarize columns, ignoring NA values
t1 <- t %>%
  group_by(date) %>%
  summarize(
    daily_vaccinations_per_million = sum(daily_vaccinations_per_million, na.rm = TRUE)
  )

# Convert date column to Date format
t1$date <- as.Date(t1$date)


# Death cases
wide_data <- read_csv("Datasets/covid_deaths_usafacts.csv")

# Define the key and value columns for pivoting
key_cols <- c("countyFIPS", "County Name", "State", "StateFIPS")
value_cols <- setdiff(names(wide_data), key_cols)

# Pivot the data from wide to long
long_data <- pivot_longer(
  wide_data,
  cols = value_cols,
  names_to = "date",
  values_to = "value"
)

# Group by 'State' and 'date', and calculate the sum of Confirmed Cases
dead_case_df <- long_data %>%
  group_by(date) %>%
  summarize(value_sum = sum(value, na.rm = TRUE))

# Convert date column to Date format
dead_case_df$date <- as.Date(dead_case_df$date)

# Merge them together
deva_df <- merge(t1, dead_case_df, by = "date", all = FALSE)
head(deva_df)
        date daily_vaccinations_per_million value_sum
1 2020-12-20                              0    324336
2 2020-12-21                            174    326365
3 2020-12-22                            384    328950
4 2020-12-23                            454    332017
5 2020-12-24                            575    335328
6 2020-12-25                            648    336496
Code
# Shape of the df
cat('The shape of this dataframe is', dim(deva_df))
The shape of this dataframe is 872 3

2.3 Unemployment Rate as Influenced by Daily Vaccination Number and Confirmed Cases:

(ARIMAX)Unemployment Rate ~ Daily Vaccination Number + Confirmed COVID-19 Cases

This model investigates the effect of Daily Vaccination Numbers and COVID-19 case counts on the unemployment rate, providing insights into the pandemic’s broader economic implications.

Code
# Read the dataset
vac_df <- read_csv("Datasets/us_state_vaccinations.csv")

# Select relevant columns
cols_show <- c('date', 'location', 'daily_vaccinations_per_million')
t <- vac_df[, cols_show]

# Group by 'date' and summarize columns, ignoring NA values
t1 <- t %>%
  group_by(date) %>%
  summarize(
    daily_vaccinations_per_million = sum(daily_vaccinations_per_million, na.rm = TRUE)
  )

# Convert date column to Date format
t1$date <- as.Date(t1$date)

# Aggregate data to monthly level using mean for each column
t1 <- t1 %>%
  group_by(date = format(date, "%Y-%m")) %>%
  summarize(daily_vaccinations_per_million = mean(daily_vaccinations_per_million, na.rm = TRUE))

# Transform the date column type with specified format
t1$date <- as.Date(paste0(t1$date, "-01-01"))

# Newly confirmed cases
wide_data <- read_csv("Datasets/covid_confirmed_usafacts.csv")

# Define the key and value columns for pivoting
key_cols <- c("countyFIPS", "County Name", "State", "StateFIPS")
value_cols <- setdiff(names(wide_data), key_cols)

# Pivot the data from wide to long
long_data <- pivot_longer(
  wide_data,
  cols = value_cols,
  names_to = "date",
  values_to = "value"
)

# Group by 'State' and 'date', and calculate the sum of Confirmed Cases
con_case_df <- long_data %>%
  group_by(date) %>%
  summarize(value_sum = sum(value, na.rm = TRUE))

# Convert date column to Date format
con_case_df$date <- as.Date(con_case_df$date)

# Aggregate data to monthly level using mean for each column
con_case_df <- con_case_df %>%
  group_by(date = format(date, "%Y-%m")) %>%
  summarize(value_sum = mean(value_sum, na.rm = TRUE))

# Transform the date column type with specified format
con_case_df$date <- as.Date(paste0(con_case_df$date, "-01-01"))


unemp <- read_csv('Datasets/unemployment.csv')
key_cols <- c("Location")
value_cols <- setdiff(names(unemp), key_cols)
unemp1 <- pivot_longer(
  unemp,
  cols = value_cols,
  names_to = "Time",
  values_to = "Unemployment"
)

# Convert Time column to Date format
unemp1$date <- as.Date(paste0(unemp1$Time, "-01"))

# Convert Unemployment column to numeric (floating-point) format
unemp1$Unemployment <- as.numeric(unemp1$Unemployment)

# Focus on US
unemp2 <- unemp1[unemp1$Location =='United States',]
unemp3 <- unemp2[,c('date', 'Unemployment')]

# Merge them together
dd_df <- merge(t1, con_case_df, by = "date", all = FALSE)
ddd_df <- merge(dd_df, unemp3, by = "date", all = FALSE)

head(ddd_df)
        date daily_vaccinations_per_million value_sum Unemployment
1 2020-12-01                       544.9167  16927034        0.067
2 2021-01-01                    123266.4194  23365835        0.063
3 2021-02-01                    304911.1429  27271116        0.062
4 2021-03-01                    429380.9032  29098434        0.060
5 2021-04-01                    514652.3333  30975088        0.061
6 2021-05-01                    319535.1290  32356593        0.058
Code
# Shape of the df
cat('The shape of this dataframe is', dim(ddd_df))
The shape of this dataframe is 16 4

2.4 Interdependence between Daily Vaccination Number and Independent Party Support Rate:

(VAR)Daily Vaccination Number ~ Independent Party Support Rate

This model evaluates how vaccination efforts and support for independent political parties influence each other over time.

Code
# Read the dataset
vac_df <- read_csv("Datasets/us_state_vaccinations.csv")

# Select relevant columns
cols_show <- c('date', 'location', 'daily_vaccinations_per_million')
t <- vac_df[, cols_show]

# Group by 'date' and summarize columns, ignoring NA values
t1 <- t %>%
  group_by(date) %>%
  summarize(
    daily_vaccinations_per_million = sum(daily_vaccinations_per_million, na.rm = TRUE)
  )

# Convert date column to Date format
t1$date <- as.Date(t1$date)

demo <- read_excel('Datasets/party.xlsx',sheet = 'Democrat')
inde <- read_excel('Datasets/party.xlsx',sheet = 'Independent')
rep <- read_excel('Datasets/party.xlsx',sheet = 'Republican')

# Transform the wide dataframe into a long dataframe
key_cols <- c("Attitude")
value_cols <- setdiff(names(demo), key_cols)
demo1 <- pivot_longer(
  demo,
  cols = value_cols,
  names_to = "date",
  values_to = "democrat"
)

inde1 <- pivot_longer(
  inde,
  cols = value_cols,
  names_to = "date",
  values_to = "independent"
)

rep1 <- pivot_longer(
  rep,
  cols = value_cols,
  names_to = "date",
  values_to = "republican"
)

# Combine these three datasets together
combined_data <- full_join(demo1, inde1, by = c("date", "Attitude")) %>%
  full_join(rep1, by = c("date", "Attitude"))
combined_data1 <- combined_data[combined_data$Attitude=='Favorable',]

# Define the key and value columns for pivoting
key_cols <- c("Attitude", "date")
value_cols <- setdiff(names(combined_data1), key_cols)

# Pivot the data from wide to long
combined_data2 <- pivot_longer(
  combined_data1,
  cols = value_cols,
  names_to = "Party",
  values_to = "value"
)

# Convert date column to Date format
combined_data2$date <- as.Date(combined_data2$date)

# Subset to each party
demo_data <- combined_data2[combined_data2$Party=='democrat',][,c("date","value")]
inde_data <- combined_data2[combined_data2$Party=='independent',][,c("date","value")]
rep_data <- combined_data2[combined_data2$Party=='republican',][,c("date","value")]

inde_df <- merge(t1, inde_data, by = "date", all = FALSE)
head(inde_df)
        date daily_vaccinations_per_million value
1 2020-12-22                            384  0.28
2 2021-01-05                           1022  0.30
3 2021-01-12                           1932  0.30
4 2021-01-19                         174090  0.32
5 2021-01-26                         214206  0.29
6 2021-02-02                         238622  0.30
Code
# Shape of the df
cat('The shape of this dataframe is', dim(inde_df))
The shape of this dataframe is 117 3

2.5 Daily Vaccination Number and Democratic Party Support Rate Dynamics:

(VAR)Daily Vaccination Number ~ Democratic Party Support Rate

Here, we assess the mutual influences between COVID-19 Daily Vaccination Numbers and support levels for the Democratic Party.

Code
demo_df <- merge(t1, demo_data, by = "date", all = FALSE)
head(demo_df)
        date daily_vaccinations_per_million value
1 2020-12-22                            384  0.86
2 2021-01-05                           1022  0.86
3 2021-01-12                           1932  0.90
4 2021-01-19                         174090  0.88
5 2021-01-26                         214206  0.91
6 2021-02-02                         238622  0.88
Code
# Shape of the df
cat('The shape of this dataframe is', dim(demo_df))
The shape of this dataframe is 117 3

2.6 The Relationship between Daily Vaccination Number and Republican Party Support Rate:

(VAR)Daily Vaccination Number ~ Republican Party Support Rate

This model explores the bidirectional influence between vaccination initiatives and Republican Party support rates.

Code
rep_df <- merge(t1, rep_data, by = "date", all = FALSE)
head(rep_df)
        date daily_vaccinations_per_million value
1 2020-12-22                            384  0.05
2 2021-01-05                           1022  0.09
3 2021-01-12                           1932  0.10
4 2021-01-19                         174090  0.09
5 2021-01-26                         214206  0.09
6 2021-02-02                         238622  0.07
Code
# Shape of the df
cat('The shape of this dataframe is', dim(rep_df))
The shape of this dataframe is 117 3

3. ARIMAX/SARIMAX

3.1 Variable Selection

To effectively construct an ARIMAX/SARIMAX model, it is crucial to select and define the response variable, which will serve as the primary focus of the analysis. Additionally, identifying relevant exogenous variables—those external predictor variables that are hypothesized to influence the response variable—is essential. This process involves a careful examination of potential predictors to ensure that they are not only relevant but also appropriately measured and free from collinearity issues. The selection of these variables should be guided by both theoretical considerations and empirical evidence, ensuring that they meaningfully contribute to the dynamics of the model. By integrating these exogenous variables, the ARIMAX/SARIMAX model can capture complex interactions and provide nuanced insights into how external factors may drive changes in the response variable over time.

Code
cava_df.ts<-ts(cava_df,star=decimal_date(as.Date("2020-12-20",format = "%Y-%m-%d")),frequency = 365.25)

autoplot(cava_df.ts[,c(2:3)], facets=TRUE, color="#5a3196") +
  xlab("Year") + ylab("") + theme_bw() +
  ggtitle("Daily changes in new confirmed case and vaccination number")

Code
deva_df.ts<-ts(deva_df,star=decimal_date(as.Date("2020-12-20",format = "%Y-%m-%d")),frequency = 365.25)

autoplot(deva_df.ts[,c(2:3)], facets=TRUE, color="#5a3196") +
  xlab("Year") + ylab("") + theme_bw() +
  ggtitle("Daily changes in dead case and new vaccination number")

Code
ddd_df.ts<-ts(ddd_df,star=decimal_date(as.Date("2020-12-01",format = "%Y-%m-%d")),frequency = 12)

autoplot(ddd_df.ts[,c(2:4)], facets=TRUE, color="#5a3196") +
  xlab("Year") + ylab("") + theme_bw() +
  ggtitle("Monthly changes in new vaccination number, confirm case, and unemployment rate")

3.2 Fit Auto.Arima()

Code
fit1 <- auto.arima(cava_df.ts[,"value_sum"],
  xreg=cava_df.ts[,"daily_vaccinations_per_million"])

summary(fit1)
Series: cava_df.ts[, "value_sum"] 
Regression with ARIMA(3,1,5) errors 

Coefficients:
          ar1     ar2     ar3     ma1      ma2      ma3     ma4     ma5
      -0.0972  0.5262  0.5203  0.2720  -0.6280  -0.6003  0.3334  0.5651
s.e.   0.0599  0.0433  0.0669  0.0517   0.0372   0.0591  0.0281  0.0311
         drift     xreg
      93299.27  -0.0061
s.e.  40830.89   0.1507

sigma^2 = 4.598e+09:  log likelihood = -10922.34
AIC=21866.69   AICc=21866.99   BIC=21919.15

Training set error measures:
                    ME    RMSE      MAE          MPE       MAPE         MASE
Training set -435.6647 67380.3 36417.53 0.0009398995 0.06462941 0.0008688438
                   ACF1
Training set 0.02955241
Code
checkresiduals(fit1)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(3,1,5) errors
Q* = 1736.4, df = 166, p-value < 2.2e-16

Model df: 8.   Total lags used: 174
Code
model_output1 <- capture.output(sarima(residuals(fit1), 3,1,5)) 

Code
cat(model_output1[76:91], model_output1[length(model_output1)], sep = "\n")
Coefficients: 
         Estimate      SE t.value p.value
ar1       -0.5252  0.2344 -2.2404  0.0253
ar2       -0.6165  0.1185 -5.2028  0.0000
ar3       -0.2279  0.1662 -1.3718  0.1705
ma1       -0.4443  0.2282 -1.9468  0.0519
ma2        0.3707  0.2879  1.2873  0.1983
ma3       -0.3128  0.2967 -1.0544  0.2920
ma4       -0.4284  0.2246 -1.9073  0.0568
ma5       -0.1851  0.0858 -2.1568  0.0313
constant  -5.9105 11.6371 -0.5079  0.6117

sigma^2 estimated as 3883502862 on 862 degrees of freedom 
 
AIC = 24.94926  AICc = 24.9495  BIC = 25.00402 
 
 
  • The Standard Residual Plot exhibits some inconsistencies in terms of mean and variance, suggesting potential issues with the model’s assumptions about homoscedasticity or linearity.

  • The Autocorrelation Function (ACF) plot reveals minimal correlation among residuals, suggesting that the model has effectively captured most of the underlying process; however, the presence of white noise indicates that the model may not be capturing all the data’s nuances, which could point to an inadequate model fit.

  • The Quantile-Quantile (Q-Q) Plot shows that the residuals are mostly aligned with a normal distribution, suggesting that the normality assumption for the errors may largely hold, which is a positive indicator for the model’s reliability.

  • Results from the Ljung-Box test show p-values below the 0.05 significance level, which typically suggests a lack of fit as it indicates autocorrelation in the residuals at lag intervals that could influence the model’s accuracy.

  • The coefficient table ($ttable) shows that the parameters ar1, ar2, ma4, and ma5 are statistically significant, highlighting their critical influence on the model’s dynamics and underscoring the importance of these terms in explaining the time series variability.

Code
fit2 <- auto.arima(deva_df.ts[,"value_sum"],
  xreg=deva_df.ts[,"daily_vaccinations_per_million"])

summary(fit2)
Series: deva_df.ts[, "value_sum"] 
Regression with ARIMA(5,2,2) errors 

Coefficients:
         ar1      ar2      ar3      ar4      ar5      ma1     ma2    xreg
      0.1645  -0.5352  -0.3586  -0.3468  -0.4067  -1.0923  0.6907  0.0005
s.e.  0.0447   0.0323   0.0329   0.0295   0.0348   0.0346  0.0517  0.0015

sigma^2 = 297676:  log likelihood = -6714.64
AIC=13447.28   AICc=13447.49   BIC=13490.2

Training set error measures:
                    ME     RMSE      MAE          MPE       MAPE        MASE
Training set -10.73691 542.4596 352.5427 -0.001237311 0.04726475 0.001078151
                   ACF1
Training set -0.0649723
Code
checkresiduals(fit2)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(5,2,2) errors
Q* = 767.24, df = 167, p-value < 2.2e-16

Model df: 7.   Total lags used: 174
Code
model_output2 <- capture.output(sarima(residuals(fit2), 5,2,2)) 

Code
cat(model_output2[77:90], model_output2[length(model_output2)], sep = "\n")
Coefficients: 
    Estimate     SE  t.value p.value
ar1  -1.5360 0.0867 -17.7142  0.0000
ar2  -1.1448 0.0897 -12.7689  0.0000
ar3  -0.8029 0.0834  -9.6288  0.0000
ar4  -0.5359 0.0736  -7.2817  0.0000
ar5  -0.1235 0.0496  -2.4886  0.0130
ma1  -0.2246 0.0776  -2.8943  0.0039
ma2  -0.7754 0.0776  -9.9942  0.0000

sigma^2 estimated as 350912.9 on 863 degrees of freedom 
 
AIC = 15.63613  AICc = 15.63628  BIC = 15.67997 
 
 
  • The Standard Residual Plot suggests some irregularities in both the mean and the variance of the residuals, pointing to potential issues in the homoscedasticity or distribution assumptions of the model.

  • The Autocorrelation Function (ACF) Plot reveals minimal correlation among residuals, suggesting that most of the systematic information in the data has been captured by the model. However, the presence of any remaining correlation, albeit slight, could indicate that the model might not be capturing all underlying patterns effectively, which is indicative of a suboptimal fit.

  • The Quantile-Quantile (Q-Q) Plot shows that the residuals are approximately normally distributed, with only minor deviations from normality. This aspect of the diagnostics is generally positive, indicating that the assumption of normality is reasonably satisfied.

  • Results from the Ljung-Box Test show p-values below the 0.05 significance level, suggesting that there is still some autocorrelation present in the residuals at lagged intervals. This result contradicts the ideal scenario where p-values would significantly exceed the threshold, confirming the absence of autocorrelation and a good fit.

  • Coefficient Significance Table: All the coefficients in the model are statistically significant, suggesting that each predictor contributes meaningfully to the model. However, significance of coefficients alone does not necessarily imply an overall effective model fit, as it does not account for potential issues in other diagnostic areas.

Code
fit3 <- auto.arima(ddd_df.ts[,"Unemployment"],
  xreg=ddd_df.ts[,2:3])

summary(fit3)
Series: ddd_df.ts[, "Unemployment"] 
Regression with ARIMA(0,0,0) errors 

Coefficients:
      intercept  daily_vaccinations_per_million  value_sum
         0.0732                           0e+00      0e+00
s.e.     0.0033                           2e-04      2e-04

sigma^2 = 1.838e-05:  log likelihood = 66.19
AIC=-124.38   AICc=-120.75   BIC=-121.29

Training set error measures:
                        ME        RMSE         MAE        MPE     MAPE
Training set -2.636216e-14 0.003864599 0.003345663 -0.4910086 7.187009
                  MASE      ACF1
Training set 0.1351783 0.6529014
Code
checkresiduals(fit3)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(0,0,0) errors
Q* = 9.5034, df = 3, p-value = 0.0233

Model df: 0.   Total lags used: 3
Code
model_output3 <- capture.output(sarima(residuals(fit3), 0,0,0)) 

Code
cat(model_output3[11:18], model_output3[length(model_output3)], sep = "\n")
Coefficients: 
      Estimate    SE t.value p.value
xmean        0 0.001       0       1

sigma^2 estimated as 1.493512e-05 on 15 degrees of freedom 
 
AIC = -8.023918  AICc = -8.006061  BIC = -7.927344 
 
 
  • The Standard Residual Plot reveals some inconsistencies in the mean and variance, suggesting potential deviations from homoscedasticity or the presence of outliers that could affect the robustness of the model predictions.

  • The Autocorrelation Function (ACF) Plot exhibits minimal correlation among the residuals, which suggests that the model has effectively captured the underlying patterns in the data, leaving behind what appears to be white noise. This is a strong indicator of a well-fitting model.

  • The Quantile-Quantile (Q-Q) Plot closely aligns with the theoretical normal distribution, underscoring the assumption of normality in the model’s residuals, which is essential for the validity of many inferential statistics.

  • The Ljung-Box Test results are mixed, with half of the p-values falling below the 0.05 significance threshold. This inconsistency suggests that some autocorrelations at different lags are significantly different from zero, indicating potential issues with the model fit.

  • Significance Table: The statistical significance of all model coefficients is affirmed, implying that each predictor contributes meaningfully to the model. This significance is crucial for understanding the impact of each variable within the model and for making informed predictions or decisions based on the model’s outputs.

3.3 Manually Fit

Code
fit.reg <- lm(value_sum ~ daily_vaccinations_per_million, data=cava_df.ts)
summary(fit.reg)

Call:
lm(formula = value_sum ~ daily_vaccinations_per_million, data = cava_df.ts)

Residuals:
      Min        1Q    Median        3Q       Max 
-67783470 -11303790   6076610  15172166  31376996 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     8.563e+07  1.032e+06   83.01   <2e-16 ***
daily_vaccinations_per_million -1.471e+02  5.458e+00  -26.95   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20530000 on 870 degrees of freedom
Multiple R-squared:  0.4549,    Adjusted R-squared:  0.4543 
F-statistic:   726 on 1 and 870 DF,  p-value: < 2.2e-16

The variables appear to be significant. Next, looking at the ACF/PACF plot of the residuals.

Code
res.fit1<-ts(residuals(fit.reg),start = decimal_date(as.Date("2020-12-20", format = "%Y-%m-%d")),frequency = 365.25)

plot1<-ggAcf(res.fit1,20) + theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 
plot2<- ggPacf(res.fit1,20)+theme_bw()+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

grid.arrange(plot1, plot2,nrow=2)

There appears to be seasonality still, so I will apply some differencing.

Code
plot1<-ggAcf(diff(res.fit1),20) + theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 
plot2<- ggPacf(diff(res.fit1),20)+theme_bw()+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

grid.arrange(plot1, plot2,nrow=2)

According to the plots, we have the following values: p = 1,2,3 q= 1,2,3.

Code
d=1
i=1
temp= data.frame()
ls=matrix(rep(NA,6*32),nrow=32) 


for (p in 0:3)
{
  for(q in 0:3)
  {
    for(d in 0:1)
    {
      
      if(p-1+d+q-1<=8) #usual threshold
      {
        
        model<- Arima(res.fit1/10000,order=c(p,d,q),include.drift=TRUE) 
        ls[i,]= c(p,d,q,model$aic,model$bic,model$aicc)
        i=i+1
        #print(i)
        
      }
      
    }
  }
}


temp= as.data.frame(ls)
names(temp)= c("p","d","q","AIC","BIC","AICc")

#temp
knitr::kable(temp)
p d q AIC BIC AICc
0 0 0 15190.02 15204.33 15190.05
0 1 0 11253.02 11262.56 11253.04
0 0 1 14123.64 14142.72 14123.68
0 1 1 11242.93 11257.24 11242.96
0 0 2 13285.16 13309.01 13285.23
0 1 2 11222.57 11241.65 11222.61
0 0 3 12809.76 12838.38 12809.86
0 1 3 11219.64 11243.49 11219.71
1 0 0 11273.37 11292.46 11273.42
1 1 0 11239.06 11253.37 11239.08
1 0 1 11263.17 11287.03 11263.24
1 1 1 11222.22 11241.30 11222.27
1 0 2 11242.16 11270.78 11242.25
1 1 2 11216.68 11240.53 11216.75
1 0 3 11238.93 11272.32 11239.05
1 1 3 11218.54 11247.16 11218.63
2 0 0 11258.99 11282.84 11259.06
2 1 0 11218.21 11237.29 11218.26
2 0 1 11240.73 11269.36 11240.83
2 1 1 11217.28 11241.13 11217.35
2 0 2 11235.32 11268.71 11235.45
2 1 2 11208.17 11236.79 11208.27
2 0 3 11237.17 11275.33 11237.33
2 1 3 11220.62 11254.01 11220.75
3 0 0 11237.31 11265.93 11237.41
3 1 0 11216.51 11240.36 11216.58
3 0 1 11235.72 11269.12 11235.85
3 1 1 11218.42 11247.03 11218.51
3 0 2 11228.83 11267.00 11229.00
3 1 2 11204.65 11238.04 11204.78
3 0 3 11229.98 11272.92 11230.19
3 1 3 11188.50 11226.66 11188.67
Code
temp[which.min(temp$AIC),] 
   p d q     AIC      BIC     AICc
32 3 1 3 11188.5 11226.66 11188.67
Code
temp[which.min(temp$BIC),]
   p d q     AIC      BIC     AICc
32 3 1 3 11188.5 11226.66 11188.67
Code
temp[which.min(temp$AICc),]
   p d q     AIC      BIC     AICc
32 3 1 3 11188.5 11226.66 11188.67

This gives me a Regression with model ARIMA(3,1,3).

Code
model_output1 <- capture.output(sarima(res.fit1, 3,1,3)) 

Code
cat(model_output1[73:86], model_output1[length(model_output1)], sep = "\n")
Coefficients: 
           Estimate         SE t.value p.value
ar1         -0.1960     0.0844 -2.3226  0.0204
ar2         -0.4063     0.0721 -5.6334  0.0000
ar3          0.6487     0.0821  7.8979  0.0000
ma1          0.3397     0.0975  3.4825  0.0005
ma2          0.5956     0.0829  7.1818  0.0000
ma3         -0.4884     0.0956 -5.1106  0.0000
constant 98615.9324 75690.4242  1.3029  0.1930

sigma^2 estimated as 2.162845e+12 on 864 degrees of freedom 
 
AIC = 31.26626  AICc = 31.26641  BIC = 31.31007 
 
 
  • The Standard Residual Plot exhibits some inconsistencies in terms of mean and variance, suggesting potential issues with the model’s assumptions about homoscedasticity or linearity.

  • The Autocorrelation Function (ACF) Plot exhibits minimal correlation among the residuals, which suggests that the model has effectively captured the underlying patterns in the data, leaving behind what appears to be white noise. This is a strong indicator of a well-fitting model.

  • The Quantile-Quantile (Q-Q) Plot shows that the residuals are mostly aligned with a normal distribution, suggesting that the normality assumption for the errors may largely hold, which is a positive indicator for the model’s reliability.

  • Results from the Ljung-Box test show p-values below the 0.05 significance level, which typically suggests a lack of fit as it indicates autocorrelation in the residuals at lag intervals that could influence the model’s accuracy.

  • Significance Table: The statistical significance of all model coefficients is affirmed, implying that each predictor contributes meaningfully to the model. This significance is crucial for understanding the impact of each variable within the model and for making informed predictions or decisions based on the model’s outputs.

Code
fit.reg <- lm(value_sum ~ daily_vaccinations_per_million, data=deva_df.ts)
summary(fit.reg)

Call:
lm(formula = value_sum ~ daily_vaccinations_per_million, data = deva_df.ts)

Residuals:
    Min      1Q  Median      3Q     Max 
-673660  -81409   69407  113393  222586 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     9.980e+05  8.768e+03  113.82   <2e-16 ***
daily_vaccinations_per_million -1.120e+00  4.639e-02  -24.15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 174500 on 870 degrees of freedom
Multiple R-squared:  0.4014,    Adjusted R-squared:  0.4007 
F-statistic: 583.4 on 1 and 870 DF,  p-value: < 2.2e-16

The variables appear to be significant. Next, looking at the ACF/PACF plot of the residuals.

Code
res.fit1<-ts(residuals(fit.reg),start = decimal_date(as.Date("2020-12-20", format = "%Y-%m-%d")),frequency = 365.25)

plot1<-ggAcf(res.fit1,20) + theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 
plot2<- ggPacf(res.fit1,20)+theme_bw()+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

grid.arrange(plot1, plot2,nrow=2)

There appears to be seasonality still, so I will apply some differencing.

Code
plot1<-ggAcf(diff(res.fit1),20) + theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 
plot2<- ggPacf(diff(res.fit1),20)+theme_bw()+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

grid.arrange(plot1, plot2,nrow=2)

According to the plots, we have the following values: p = 1,2,3 q= 1,2,3.

Code
d=1
i=1
temp= data.frame()
ls=matrix(rep(NA,6*32),nrow=32) 


for (p in 0:3)
{
  for(q in 0:3)
  {
    for(d in 0:1)
    {
      
      if(p-1+d+q-1<=8) #usual threshold
      {
        
        model<- Arima(res.fit1/10000,order=c(p,d,q),include.drift=TRUE) 
        ls[i,]= c(p,d,q,model$aic,model$bic,model$aicc)
        i=i+1
        #print(i)
        
      }
      
    }
  }
}


temp= as.data.frame(ls)
names(temp)= c("p","d","q","AIC","BIC","AICc")

#temp
knitr::kable(temp)
p d q AIC BIC AICc
0 0 0 6884.051 6898.364 6884.079
0 1 0 2772.386 2781.925 2772.400
0 0 1 5806.323 5825.406 5806.369
0 1 1 2760.379 2774.688 2760.407
0 0 2 4940.384 4964.238 4940.453
0 1 2 2738.328 2757.406 2738.374
0 0 3 4458.706 4487.331 4458.803
0 1 3 2734.816 2758.664 2734.885
1 0 0 2784.017 2803.100 2784.063
1 1 0 2755.694 2770.003 2755.722
1 0 1 2772.545 2796.399 2772.614
1 1 1 2735.762 2754.840 2735.808
1 0 2 2749.551 2778.176 2749.648
1 1 2 2730.846 2754.694 2730.915
1 0 3 2746.060 2779.455 2746.189
1 1 3 2732.765 2761.383 2732.862
2 0 0 2767.347 2791.201 2767.416
2 1 0 2732.869 2751.948 2732.916
2 0 1 2741.094 2769.718 2741.191
2 1 1 2731.293 2755.141 2731.363
2 0 2 2768.397 2801.793 2768.527
2 1 2 2725.615 2754.233 2725.712
2 0 3 2744.072 2782.238 2744.238
2 1 3 2734.797 2768.185 2734.927
3 0 0 2743.763 2772.388 2743.860
3 1 0 2730.685 2754.533 2730.754
3 0 1 2742.232 2775.628 2742.362
3 1 1 2732.659 2761.277 2732.757
3 0 2 2746.984 2785.150 2747.151
3 1 2 2734.658 2768.046 2734.788
3 0 3 2751.145 2794.082 2751.354
3 1 3 2702.731 2740.888 2702.898
Code
temp[which.min(temp$AIC),] 
   p d q      AIC      BIC     AICc
32 3 1 3 2702.731 2740.888 2702.898
Code
temp[which.min(temp$BIC),]
   p d q      AIC      BIC     AICc
32 3 1 3 2702.731 2740.888 2702.898
Code
temp[which.min(temp$AICc),]
   p d q      AIC      BIC     AICc
32 3 1 3 2702.731 2740.888 2702.898

This gives me a Regression with model ARIMA(3,1,3).

Code
model_output1 <- capture.output(sarima(res.fit1, 3,1,3)) 

Code
cat(model_output1[77:90], model_output1[length(model_output1)], sep = "\n")
Coefficients: 
         Estimate       SE t.value p.value
ar1       -0.1644   0.0803 -2.0482  0.0408
ar2       -0.3805   0.0685 -5.5579  0.0000
ar3        0.6786   0.0780  8.7049  0.0000
ma1        0.3127   0.0951  3.2866  0.0011
ma2        0.5740   0.0806  7.1185  0.0000
ma3       -0.5148   0.0933 -5.5190  0.0000
constant 947.2074 603.9139  1.5684  0.1171

sigma^2 estimated as 127026096 on 864 degrees of freedom 
 
AIC = 21.5237  AICc = 21.52385  BIC = 21.56751 
 
 
  • The Standard Residual Plot exhibits some inconsistencies in terms of mean and variance, suggesting potential issues with the model’s assumptions about homoscedasticity or linearity.

  • The Autocorrelation Function (ACF) Plot exhibits minimal correlation among the residuals, which suggests that the model has effectively captured the underlying patterns in the data, leaving behind what appears to be white noise. This is a strong indicator of a well-fitting model.

  • The Quantile-Quantile (Q-Q) Plot shows that the residuals are mostly aligned with a normal distribution, suggesting that the normality assumption for the errors may largely hold, which is a positive indicator for the model’s reliability.

  • Results from the Ljung-Box test show p-values below the 0.05 significance level, which typically suggests a lack of fit as it indicates autocorrelation in the residuals at lag intervals that could influence the model’s accuracy.

  • Significance Table: The statistical significance of all model coefficients is affirmed, implying that each predictor contributes meaningfully to the model. This significance is crucial for understanding the impact of each variable within the model and for making informed predictions or decisions based on the model’s outputs.

Code
fit.reg <- lm(Unemployment ~ daily_vaccinations_per_million + value_sum, data=ddd_df.ts)
summary(fit.reg)

Call:
lm(formula = Unemployment ~ daily_vaccinations_per_million + 
    value_sum, data = ddd_df.ts)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.008137 -0.002539  0.001572  0.002869  0.004882 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     7.323e-02  3.660e-03  20.008 3.78e-11 ***
daily_vaccinations_per_million  1.333e-09  8.421e-09   0.158    0.877    
value_sum                      -5.306e-10  6.405e-11  -8.283 1.52e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.004287 on 13 degrees of freedom
Multiple R-squared:  0.8501,    Adjusted R-squared:  0.827 
F-statistic: 36.86 on 2 and 13 DF,  p-value: 4.396e-06

The confirmed COVID-19 case number appears to be significant. Next, looking at the ACF/PACF plot of the residuals.

Code
res.fit1<-ts(residuals(fit.reg),start = decimal_date(as.Date("2020-12-01", format = "%Y-%m-%d")),frequency = 12)

plot1<-ggAcf(res.fit1) + theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 
plot2<- ggPacf(res.fit1)+theme_bw()+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

grid.arrange(plot1, plot2,nrow=2)

According to the plots, we have the following values: p = 1 q= 1.

Code
d=1
i=1
temp= data.frame()
ls=matrix(rep(NA,6*8),nrow=8) 


for (p in 0:1)
{
  for(q in 0:1)
  {
    for(d in 0:1)
    {
      
      if(p-1+d+q-1<=8) #usual threshold
      {
        
        model<- Arima(res.fit1/10000,order=c(p,d,q),include.drift=TRUE) 
        ls[i,]= c(p,d,q,model$aic,model$bic,model$aicc)
        i=i+1
        #print(i)
        
      }
      
    }
  }
}


temp= as.data.frame(ls)
names(temp)= c("p","d","q","AIC","BIC","AICc")

#temp
knitr::kable(temp)
p d q AIC BIC AICc
0 0 0 -422.7689 -420.4511 -420.7689
0 1 0 -403.1358 -401.7197 -402.1358
0 0 1 -427.6066 -424.5162 -423.9702
0 1 1 -401.2666 -399.1424 -399.0848
1 0 0 -428.8463 -425.7560 -425.2100
1 1 0 -401.2515 -399.1273 -399.0697
1 0 1 -427.8426 -423.9797 -421.8426
1 1 1 -399.2710 -396.4388 -395.2710
Code
temp[which.min(temp$AIC),] 
  p d q       AIC      BIC    AICc
5 1 0 0 -428.8463 -425.756 -425.21
Code
temp[which.min(temp$BIC),]
  p d q       AIC      BIC    AICc
5 1 0 0 -428.8463 -425.756 -425.21
Code
temp[which.min(temp$AICc),]
  p d q       AIC      BIC    AICc
5 1 0 0 -428.8463 -425.756 -425.21

This gives me a Regression with model ARIMA(1,0,0).

Code
model_output1 <- capture.output(sarima(res.fit1, 1,0,0)) 

Code
cat(model_output1[21:29], model_output1[length(model_output1)], sep = "\n")
Coefficients: 
      Estimate     SE t.value p.value
ar1     0.6799 0.1747  3.8916  0.0016
xmean   0.0007 0.0020  0.3457  0.7347

sigma^2 estimated as 7.862675e-06 on 14 degrees of freedom 
 
AIC = -8.501737  AICc = -8.444045  BIC = -8.356877 
 
 
  • The Standard Residual Plot exhibits some inconsistencies in terms of mean and variance, suggesting potential issues with the model’s assumptions about homoscedasticity or linearity.

  • The Autocorrelation Function (ACF) Plot exhibits minimal correlation among the residuals, which suggests that the model has effectively captured the underlying patterns in the data, leaving behind what appears to be white noise. This is a strong indicator of a well-fitting model.

  • The Quantile-Quantile (Q-Q) Plot shows that the residuals are mostly aligned with a normal distribution, suggesting that the normality assumption for the errors may largely hold, which is a positive indicator for the model’s reliability.

  • Results from the Ljung-Box test show p-values above the 0.05 significance level, which typically suggests a good fit as it indicates autocorrelation in the residuals at lag intervals.

  • Significance Table: The statistical significance of all model coefficients is affirmed, implying that each predictor contributes meaningfully to the model. This significance is crucial for understanding the impact of each variable within the model and for making informed predictions or decisions based on the model’s outputs.

3.4 Cross Validation

To confirm our selection on the model, we choose to use cross validation to compare the RMSE values to help us make decision.

Auto.arima gave me ARIMA(3,1,5), while the manual model selection resulted in ARIMA(3,1,3).

Code
fit.reg <- lm(value_sum ~ daily_vaccinations_per_million, data=cava_df.ts)
res.fit1<-ts(residuals(fit.reg),start = decimal_date(as.Date("2020-12-20", format = "%Y-%m-%d")),frequency = 365.25)

n = length(res.fit1) # 872
k = 220

i=1
err1 = c()
err2 = c()

for(i in 1:(n-k))
{
  xtrain <- res.fit1[1:(k-1)+i] #observations from 1 to 12
  xtest <- res.fit1[k+i] #13th observation as the test set
  
  fit1 <- Arima(xtrain, order=c(3,1,5),include.drift=FALSE, method="ML")
  fcast1 <- forecast(fit1, h=1)
  
  fit2 <- Arima(xtrain, order=c(3,1,3),include.drift=FALSE, method="ML")
  fcast2 <- forecast(fit2, h=1)

  
  #capture error for each iteration
  # This is mean absolute error
  err1 = c(err1, abs(fcast1$mean-xtest)) 
  err2 = c(err2, abs(fcast2$mean-xtest))
  
  # This is mean squared error
  err3 = c(err1, (fcast1$mean-xtest)^2)
  err4 = c(err2, (fcast2$mean-xtest)^2)
  #print(i)
}

RMSE1 <- sqrt(mean(err3, na.rm = TRUE))
RMSE2 <- sqrt(mean(err4, na.rm = TRUE))

cat('The RMSE of Auto Arima is', RMSE1, '\n')
The RMSE of Auto Arima is 951.6204 
Code
cat('The RMSE of Manually Selection is', RMSE2, '\n')
The RMSE of Manually Selection is 1930.114 

The Auto Arima model had lower RMSE compared to the manual model. Therefore, the best model is ARIMA(3,1,5).

Auto.arima gave me ARIMA(5,2,2), while the manual model selection resulted in ARIMA(3,1,3).

Code
fit.reg <- lm(value_sum ~ daily_vaccinations_per_million, data=deva_df.ts)
res.fit1<-ts(residuals(fit.reg),start = decimal_date(as.Date("2020-12-20", format = "%Y-%m-%d")),frequency = 365.25)

n = length(res.fit1) # 872
k = 220

i=1
err1 = c()
err2 = c()

for(i in 1:(n-k))
{
  xtrain <- res.fit1[1:(k-1)+i] #observations from 1 to 12
  xtest <- res.fit1[k+i] #13th observation as the test set
  
  fit1 <- Arima(xtrain, order=c(5,2,2),include.drift=FALSE, method="ML")
  fcast1 <- forecast(fit1, h=1)
  
  fit2 <- Arima(xtrain, order=c(3,1,3),include.drift=FALSE, method="ML")
  fcast2 <- forecast(fit2, h=1)

  
  #capture error for each iteration
  # This is mean absolute error
  err1 = c(err1, abs(fcast1$mean-xtest)) 
  err2 = c(err2, abs(fcast2$mean-xtest))
  
  # This is mean squared error
  err3 = c(err1, (fcast1$mean-xtest)^2)
  err4 = c(err2, (fcast2$mean-xtest)^2)
  #print(i)
}

RMSE1 <- sqrt(mean(err3, na.rm = TRUE))
RMSE2 <- sqrt(mean(err4, na.rm = TRUE))

cat('The RMSE of Auto Arima is', RMSE1, '\n')
The RMSE of Auto Arima is 57.65793 
Code
cat('The RMSE of Manually Selection is', RMSE2, '\n')
The RMSE of Manually Selection is 59.52566 

The manual model had lower RMSE compared to the Auto Arima model. Therefore, the best model is ARIMA(3,1,3).

Auto.arima gave me ARIMA(0,0,0), while the manual model selection resulted in ARIMA(1,0,0). Since the data sample for this time series only contains 16 observations, we choose to use ARIMA(1,0,0) since ARIMA(0,0,0) contains more biases

3.5 Model Fitting

In this section, we will use the best model to fit the model.

The best ARIMAX model for this relationship is ARIMA(3,1,5).

Code
fit1 <- Arima(cava_df.ts[,"value_sum"],order=c(3,1,5), xreg=cava_df.ts[,"daily_vaccinations_per_million"])
summary(fit1)
Series: cava_df.ts[, "value_sum"] 
Regression with ARIMA(3,1,5) errors 

Coefficients:
          ar1    ar2     ar3     ma1      ma2      ma3     ma4     ma5     xreg
      -0.0956  0.533  0.5298  0.2741  -0.6293  -0.6067  0.3318  0.5648  -0.0029
s.e.   0.0594  0.043  0.0667  0.0513   0.0371   0.0584  0.0280  0.0308   0.1508

sigma^2 = 4.609e+09:  log likelihood = -10924.09
AIC=21868.18   AICc=21868.44   BIC=21915.88

Training set error measures:
                   ME     RMSE      MAE         MPE       MAPE         MASE
Training set 2872.186 67499.31 36164.96 0.007343933 0.06440783 0.0008628181
                   ACF1
Training set 0.02757954

Equation:

\[ \nabla Y_t = \mu - 0.0956 \nabla Y_{t-1} + 0.533 \nabla Y_{t-2} + 0.5298 \nabla Y_{t-3} + 0.2741 \epsilon_{t-1} - 0.6293 \epsilon_{t-2} - 0.6067 \epsilon_{t-3} + 0.3318 \epsilon_{t-4} + 0.5648 \epsilon_{t-5} - 0.0029 X_t + \epsilon_t \]

The best ARIMAX model for this relationship is ARIMA(3,1,3).

Code
fit1 <- Arima(deva_df.ts[,"value_sum"],order=c(3,1,3), xreg=deva_df.ts[,"daily_vaccinations_per_million"])
summary(fit1)
Series: deva_df.ts[, "value_sum"] 
Regression with ARIMA(3,1,3) errors 

Coefficients:
         ar1      ar2     ar3      ma1     ma2      ma3    xreg
      0.5142  -0.4725  0.9553  -0.3085  0.4341  -0.8067  0.0018
s.e.  0.0153   0.0215  0.0156   0.0260  0.0268   0.0195  0.0019

sigma^2 = 378948:  log likelihood = -6828.51
AIC=13673.02   AICc=13673.19   BIC=13711.18

Training set error measures:
                    ME     RMSE      MAE           MPE       MAPE       MASE
Training set -10.57903 612.7571 411.9691 -0.0006127656 0.05651456 0.00125989
                   ACF1
Training set 0.05821364

Equation:

\[ \nabla Y_t = \mu + 0.5142 \nabla Y_{t-1} - 0.4725 \nabla Y_{t-2} + 0.9553 \nabla Y_{t-3} - 0.3085 \epsilon_{t-1} + 0.4341 \epsilon_{t-2} - 0.8067 \epsilon_{t-3} + 0.0018 X_t + \epsilon_t \]

The best ARIMAX model for this relationship is ARIMA(1,0,0).

Code
fit1 <- Arima(ddd_df.ts[,"Unemployment"],order=c(1,0,0), xreg=ddd_df.ts[,2:3])
summary(fit1)
Series: ddd_df.ts[, "Unemployment"] 
Regression with ARIMA(1,0,0) errors 

Coefficients:
        ar1  intercept  daily_vaccinations_per_million  value_sum
      0.897      0.066                               0          0
s.e.    NaN        NaN                             NaN        NaN

sigma^2 = 8.212e-06:  log likelihood = 72.46
AIC=-134.92   AICc=-128.92   BIC=-131.06

Training set error measures:
                        ME        RMSE         MAE      MPE     MAPE       MASE
Training set -0.0005780741 0.002481684 0.002158275 -1.36368 4.464085 0.08720304
                    ACF1
Training set -0.08143333

Equation:

\[ \nabla Y_t = 0.066 + 0.897 \nabla Y_{t-1} + \epsilon_t \]

3.6 Forecast

Then, I use the best model for each relationship and forecast the values.

Code
fit1 <- Arima(cava_df.ts[,"value_sum"],order=c(3,1,5), xreg=cava_df.ts[,"daily_vaccinations_per_million"])
forecast(fit1, h = 100, xreg=cava_df.ts[,"daily_vaccinations_per_million"]) %>%
  autoplot() + xlab("Date")+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

In summary, this forecast suggests that, according to the ARIMA(3,1,5) model used, the daily confirmed cases are expected to remain at a consistent level in the near term, with the uncertainty of the forecast increasing as it projects further out into the future.

Code
fit1 <- Arima(deva_df.ts[,"value_sum"],order=c(3,1,3), xreg=deva_df.ts[,"daily_vaccinations_per_million"])
forecast(fit1, h = 100, xreg=deva_df.ts[,"daily_vaccinations_per_million"]) %>%
  autoplot() + xlab("Date")+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

In summary, this forecast suggests that, according to the ARIMA(3,1,3) model used, the daily dead cases are expected to remain at a consistent level in the near term, with the uncertainty of the forecast increasing as it projects further out into the future.

Code
fit1 <- Arima(ddd_df.ts[,"Unemployment"],order=c(1,0,0), xreg=ddd_df.ts[,2:3])
forecast(fit1, h = 4, xreg=ddd_df.ts[,2:3]) %>%
  autoplot() + xlab("Month")+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA)) 

In summary, the model seems to predict that the decreasing trend in unemployment observed in the actual data will not persist in the same way, and that unemployment may stabilize at the levels seen at the end of the observed data.

4. VAR

4.1 Variable Selection

When crafting a VAR model, it is pivotal to delineate each variable that will be considered within the system. Unlike ARIMAX/SARIMAX models, which focus on a primary response variable influenced by exogenous predictors, VAR models treat all included variables as endogenously interrelated. Each variable in a VAR model is explained by its own lagged values as well as the lagged values of all other variables in the system, allowing for a multidirectional analysis of influence and response over time.

The selection process for variables in a VAR model is underpinned by both theoretical frameworks and empirical data, necessitating a rigorous review to ascertain their interconnectedness and the absence of unit roots that may indicate non-stationarity. This ensures that each variable not only possesses inherent relevance but also contributes to a comprehensive understanding of the system’s internal dynamics. By employing a VAR model, one can dissect and comprehend the nuanced interplay between variables, enabling a holistic view of how each variable evolves in relation to others within the dataset.

Code
inde_df.ts<-ts(inde_df,star=decimal_date(as.Date("2020-12-22",format = "%Y-%m-%d")),frequency = 52)

autoplot(inde_df.ts[,c(2:3)], facets=TRUE, color="#5a3196") +
  xlab("Year") + ylab("") + theme_bw() +
  ggtitle("Daily changes in new vaccination number and independent party support rate")

Code
demo_df.ts<-ts(demo_df,star=decimal_date(as.Date("2020-12-22",format = "%Y-%m-%d")),frequency = 52)

autoplot(demo_df.ts[,c(2:3)], facets=TRUE, color="#5a3196") +
  xlab("Year") + ylab("") + theme_bw() +
  ggtitle("Daily changes in new vaccination number and democratic party support rate")

Code
rep_df.ts<-ts(rep_df,star=decimal_date(as.Date("2020-12-22",format = "%Y-%m-%d")),frequency = 52)

autoplot(rep_df.ts[,c(2:3)], facets=TRUE, color="#5a3196") +
  xlab("Year") + ylab("") + theme_bw() +
  ggtitle("Daily changes in new vaccination number and republican party support rate")

4.2 Variable Selection

In the process of refining our VAR model, a critical step involves selecting the optimal number of lag periods, denoted as ( p ). This selection is pivotal as it significantly influences the model’s accuracy and effectiveness in capturing the dynamics between the time series variables. To determine the best ( p ), we utilize criteria such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Hannan-Quinn Information Criterion (HQIC), which help in balancing model complexity and goodness of fit.

Code
VARselect(inde_df.ts, lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     3      1      1      3 

$criteria
                  1            2            3            4            5
AIC(n) 1.480094e+01 1.475230e+01 1.474972e+01 1.479836e+01 1.486156e+01
HQ(n)  1.495284e+01 1.499533e+01 1.508389e+01 1.522367e+01 1.537801e+01
SC(n)  1.517564e+01 1.535181e+01 1.557405e+01 1.584751e+01 1.613553e+01
FPE(n) 2.679516e+06 2.553912e+06 2.550776e+06 2.684193e+06 2.869670e+06
                  6            7            8            9           10
AIC(n) 1.482814e+01 1.493031e+01 1.505620e+01 1.509082e+01 1.516958e+01
HQ(n)  1.543572e+01 1.562903e+01 1.584606e+01 1.597182e+01 1.614172e+01
SC(n)  1.632692e+01 1.665391e+01 1.700461e+01 1.726405e+01 1.756763e+01
FPE(n) 2.789790e+06 3.111892e+06 3.562422e+06 3.732117e+06 4.098560e+06

According to the selection criterion, we select the model with p=1 and 3.

Code
summary(vars::VAR(inde_df.ts, p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: date, daily_vaccinations_per_million, value 
Deterministic variables: both 
Sample size: 116 
Log Likelihood: -1356.754 
Roots of the characteristic polynomial:
0.914 0.8901 0.3618
Call:
vars::VAR(y = inde_df.ts, p = 1, type = "both")


Estimation results for equation date: 
===================================== 
date = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                   Estimate Std. Error t value Pr(>|t|)    
date.l1                           9.030e-01  4.630e-02  19.505   <2e-16 ***
daily_vaccinations_per_million.l1 1.724e-08  1.882e-06   0.009   0.9927    
value.l1                          2.129e+00  5.309e+00   0.401   0.6891    
const                             1.811e+03  8.609e+02   2.103   0.0377 *  
trend                             7.207e-01  3.417e-01   2.109   0.0372 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.859 on 111 degrees of freedom
Multiple R-Squared: 0.9999, Adjusted R-squared: 0.9999 
F-statistic: 5.17e+05 on 4 and 111 DF,  p-value: < 2.2e-16 


Estimation results for equation daily_vaccinations_per_million: 
=============================================================== 
daily_vaccinations_per_million = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                           -6.894e+02  8.357e+02  -0.825  0.41120    
daily_vaccinations_per_million.l1  8.705e-01  3.397e-02  25.624  < 2e-16 ***
value.l1                           2.724e+05  9.583e+04   2.842  0.00533 ** 
const                              1.280e+07  1.554e+07   0.824  0.41196    
trend                              4.604e+03  6.167e+03   0.746  0.45698    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 33560 on 111 degrees of freedom
Multiple R-Squared: 0.9339, Adjusted R-squared: 0.9315 
F-statistic: 391.8 on 4 and 111 DF,  p-value: < 2.2e-16 


Estimation results for equation value: 
====================================== 
value = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            1.959e-03  7.669e-04   2.555   0.0120 *  
daily_vaccinations_per_million.l1  4.250e-08  3.117e-08   1.363   0.1755    
value.l1                           3.924e-01  8.794e-02   4.462 1.95e-05 ***
const                             -3.631e+01  1.426e+01  -2.546   0.0123 *  
trend                             -1.427e-02  5.659e-03  -2.521   0.0131 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.03079 on 111 degrees of freedom
Multiple R-Squared: 0.3111, Adjusted R-squared: 0.2863 
F-statistic: 12.53 on 4 and 111 DF,  p-value: 1.895e-08 



Covariance matrix of residuals:
                                     date daily_vaccinations_per_million
date                            3.456e+00                     -1.884e+04
daily_vaccinations_per_million -1.884e+04                      1.126e+09
value                           6.952e-03                     -3.151e+01
                                    value
date                            6.952e-03
daily_vaccinations_per_million -3.151e+01
value                           9.483e-04

Correlation matrix of residuals:
                                  date daily_vaccinations_per_million    value
date                            1.0000                       -0.30194  0.12144
daily_vaccinations_per_million -0.3019                        1.00000 -0.03049
value                           0.1214                       -0.03049  1.00000
Code
summary(vars::VAR(inde_df.ts, p=3, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: date, daily_vaccinations_per_million, value 
Deterministic variables: both 
Sample size: 114 
Log Likelihood: -1301.287 
Roots of the characteristic polynomial:
0.9039 0.9039 0.773 0.4963 0.4963 0.4696 0.4696 0.4464 0.435
Call:
vars::VAR(y = inde_df.ts, p = 3, type = "both")


Estimation results for equation date: 
===================================== 
date = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            7.913e-01  1.008e-01   7.853 4.05e-12 ***
daily_vaccinations_per_million.l1 -5.850e-06  5.182e-06  -1.129   0.2616    
value.l1                          -3.134e+00  5.909e+00  -0.530   0.5970    
date.l2                            1.998e-01  1.256e-01   1.590   0.1149    
daily_vaccinations_per_million.l2  4.061e-06  7.061e-06   0.575   0.5665    
value.l2                           8.315e+00  5.763e+00   1.443   0.1521    
date.l3                           -8.653e-02  9.656e-02  -0.896   0.3723    
daily_vaccinations_per_million.l3  4.690e-06  4.936e-06   0.950   0.3443    
value.l3                          -1.275e+00  5.978e+00  -0.213   0.8316    
const                              1.781e+03  9.125e+02   1.952   0.0537 .  
trend                              7.192e-01  3.613e-01   1.990   0.0492 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.73 on 103 degrees of freedom
Multiple R-Squared:     1,  Adjusted R-squared:     1 
F-statistic: 2.27e+05 on 10 and 103 DF,  p-value: < 2.2e-16 


Estimation results for equation daily_vaccinations_per_million: 
=============================================================== 
daily_vaccinations_per_million = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            9.748e+02  1.864e+03   0.523  0.60211    
daily_vaccinations_per_million.l1  9.188e-01  9.585e-02   9.585 6.18e-16 ***
value.l1                           1.874e+05  1.093e+05   1.715  0.08943 .  
date.l2                            1.743e+03  2.324e+03   0.750  0.45488    
daily_vaccinations_per_million.l2  1.470e-01  1.306e-01   1.125  0.26303    
value.l2                           7.362e+04  1.066e+05   0.691  0.49137    
date.l3                           -3.968e+03  1.786e+03  -2.222  0.02848 *  
daily_vaccinations_per_million.l3 -2.530e-01  9.131e-02  -2.771  0.00663 ** 
value.l3                           7.192e+04  1.106e+05   0.650  0.51690    
const                              2.319e+07  1.688e+07   1.374  0.17237    
trend                              8.562e+03  6.683e+03   1.281  0.20300    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 32000 on 103 degrees of freedom
Multiple R-Squared: 0.943,  Adjusted R-squared: 0.9375 
F-statistic: 170.4 on 10 and 103 DF,  p-value: < 2.2e-16 


Estimation results for equation value: 
====================================== 
value = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)   
date.l1                            4.375e-03  1.641e-03   2.666  0.00892 **
daily_vaccinations_per_million.l1  1.207e-07  8.440e-08   1.431  0.15557   
value.l1                           1.576e-01  9.624e-02   1.638  0.10448   
date.l2                           -1.759e-03  2.046e-03  -0.860  0.39201   
daily_vaccinations_per_million.l2 -2.634e-08  1.150e-07  -0.229  0.81928   
value.l2                           3.130e-01  9.386e-02   3.335  0.00119 **
date.l3                           -1.981e-03  1.573e-03  -1.259  0.21072   
daily_vaccinations_per_million.l3 -7.508e-08  8.040e-08  -0.934  0.35255   
value.l3                           1.861e-01  9.737e-02   1.911  0.05873 . 
const                             -1.178e+01  1.486e+01  -0.793  0.42971   
trend                             -4.541e-03  5.885e-03  -0.772  0.44212   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02818 on 103 degrees of freedom
Multiple R-Squared: 0.4611, Adjusted R-squared: 0.4088 
F-statistic: 8.814 on 10 and 103 DF,  p-value: 2.602e-10 



Covariance matrix of residuals:
                                     date daily_vaccinations_per_million
date                            2.993e+00                     -1.450e+04
daily_vaccinations_per_million -1.450e+04                      1.024e+09
value                           3.949e-03                     -8.783e+01
                                    value
date                            3.949e-03
daily_vaccinations_per_million -8.783e+01
value                           7.938e-04

Correlation matrix of residuals:
                                   date daily_vaccinations_per_million    value
date                            1.00000                       -0.26203  0.08103
daily_vaccinations_per_million -0.26203                        1.00000 -0.09742
value                           0.08103                       -0.09742  1.00000

Examining the VAR models with ( p = 1 ) and ( p = 3 ) lag orders, we notice the presence of some variables that do not significantly contribute to the model. Notably, the model with ( p = 3 ) lag periods tends to have fewer significant variables compared to the ( p = 1 ) model. However, opting for ( p = 1 ) may overly simplify the model, potentially failing to capture all the relevant and significant relationships among the variables. Thus, while ( p = 1 ) offers a more parsimonious model, it might not sufficiently account for the complexity of the dynamics within the data, suggesting that a balance must be struck between model simplicity and its ability to elucidate significant interactions.

Code
VARselect(demo_df.ts, lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     6      1      1      6 

$criteria
                  1            2            3            4            5
AIC(n) 1.438534e+01 1.440756e+01 1.431920e+01 1.433737e+01 1.427389e+01
HQ(n)  1.453724e+01 1.465060e+01 1.465337e+01 1.476268e+01 1.479034e+01
SC(n)  1.476004e+01 1.500708e+01 1.514353e+01 1.538652e+01 1.554786e+01
FPE(n) 1.768330e+06 1.809214e+06 1.658430e+06 1.692801e+06 1.594443e+06
                  6            7            8            9           10
AIC(n) 1.422535e+01 1.426444e+01 1.438205e+01 1.445555e+01 1.448973e+01
HQ(n)  1.483294e+01 1.496317e+01 1.517191e+01 1.533655e+01 1.546186e+01
SC(n)  1.572414e+01 1.598804e+01 1.633046e+01 1.662879e+01 1.688778e+01
FPE(n) 1.526816e+06 1.598976e+06 1.815374e+06 1.977256e+06 2.076702e+06

According to the selection criterion, we select the model with p=1 and 6.

Code
summary(vars::VAR(demo_df.ts, p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: date, daily_vaccinations_per_million, value 
Deterministic variables: both 
Sample size: 116 
Log Likelihood: -1326.706 
Roots of the characteristic polynomial:
0.9116 0.868 0.4189
Call:
vars::VAR(y = demo_df.ts, p = 1, type = "both")


Estimation results for equation date: 
===================================== 
date = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            9.152e-01  4.362e-02  20.982   <2e-16 ***
daily_vaccinations_per_million.l1  3.004e-07  1.856e-06   0.162   0.8717    
value.l1                          -3.985e+00  6.743e+00  -0.591   0.5557    
const                              1.588e+03  8.105e+02   1.959   0.0526 .  
trend                              6.317e-01  3.224e-01   1.959   0.0526 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.857 on 111 degrees of freedom
Multiple R-Squared: 0.9999, Adjusted R-squared: 0.9999 
F-statistic: 5.178e+05 on 4 and 111 DF,  p-value: < 2.2e-16 


Estimation results for equation daily_vaccinations_per_million: 
=============================================================== 
daily_vaccinations_per_million = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                           -2.499e+02  7.848e+02  -0.318  0.75077    
daily_vaccinations_per_million.l1  8.776e-01  3.340e-02  26.278  < 2e-16 ***
value.l1                           3.652e+05  1.213e+05   3.011  0.00323 ** 
const                              4.376e+06  1.458e+07   0.300  0.76469    
trend                              1.412e+03  5.801e+03   0.243  0.80807    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 33420 on 111 degrees of freedom
Multiple R-Squared: 0.9344, Adjusted R-squared: 0.932 
F-statistic: 395.3 on 4 and 111 DF,  p-value: < 2.2e-16 


Estimation results for equation value: 
====================================== 
value = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            9.846e-04  5.604e-04   1.757   0.0817 .  
daily_vaccinations_per_million.l1 -6.079e-09  2.385e-08  -0.255   0.7993    
value.l1                           4.058e-01  8.662e-02   4.684 8.02e-06 ***
const                             -1.780e+01  1.041e+01  -1.710   0.0901 .  
trend                             -7.294e-03  4.142e-03  -1.761   0.0810 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02386 on 111 degrees of freedom
Multiple R-Squared: 0.2103, Adjusted R-squared: 0.1819 
F-statistic: 7.391 on 4 and 111 DF,  p-value: 2.576e-05 



Covariance matrix of residuals:
                                     date daily_vaccinations_per_million
date                            3.450e+00                     -1.720e+04
daily_vaccinations_per_million -1.720e+04                      1.117e+09
value                          -8.675e-04                      1.326e+02
                                    value
date                           -8.675e-04
daily_vaccinations_per_million  1.326e+02
value                           5.695e-04

Correlation matrix of residuals:
                                   date daily_vaccinations_per_million    value
date                            1.00000                        -0.2771 -0.01957
daily_vaccinations_per_million -0.27709                         1.0000  0.16632
value                          -0.01957                         0.1663  1.00000
Code
summary(vars::VAR(demo_df.ts, p=6, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: date, daily_vaccinations_per_million, value 
Deterministic variables: both 
Sample size: 111 
Log Likelihood: -1206.123 
Roots of the characteristic polynomial:
0.9774 0.9451 0.9451 0.8051 0.8051 0.7966 0.7966 0.7777 0.748 0.748 0.7465 0.7465 0.7393 0.7393 0.5313 0.5313 0.497 0.497
Call:
vars::VAR(y = demo_df.ts, p = 6, type = "both")


Estimation results for equation date: 
===================================== 
date = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + date.l4 + daily_vaccinations_per_million.l4 + value.l4 + date.l5 + daily_vaccinations_per_million.l5 + value.l5 + date.l6 + daily_vaccinations_per_million.l6 + value.l6 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            7.744e-01  1.067e-01   7.257 1.28e-10 ***
daily_vaccinations_per_million.l1 -9.409e-06  6.125e-06  -1.536  0.12795    
value.l1                          -3.182e+00  8.333e+00  -0.382  0.70344    
date.l2                            2.109e-01  1.352e-01   1.559  0.12243    
daily_vaccinations_per_million.l2  6.106e-06  7.888e-06   0.774  0.44089    
value.l2                           1.004e+00  7.945e+00   0.126  0.89969    
date.l3                           -1.116e-02  1.339e-01  -0.083  0.93376    
daily_vaccinations_per_million.l3  1.974e-05  7.768e-06   2.542  0.01272 *  
value.l3                          -2.905e+00  7.871e+00  -0.369  0.71296    
date.l4                           -7.672e-02  1.313e-01  -0.584  0.56049    
daily_vaccinations_per_million.l4 -2.096e-05  7.377e-06  -2.841  0.00554 ** 
value.l4                           1.098e+01  7.882e+00   1.393  0.16699    
date.l5                           -9.411e-02  1.350e-01  -0.697  0.48759    
daily_vaccinations_per_million.l5  1.267e-06  7.433e-06   0.170  0.86501    
value.l5                          -3.205e+00  8.133e+00  -0.394  0.69447    
date.l6                            8.972e-02  1.083e-01   0.829  0.40936    
daily_vaccinations_per_million.l6  7.508e-06  5.513e-06   1.362  0.17655    
value.l6                           5.956e+00  8.429e+00   0.707  0.48166    
const                              1.989e+03  9.640e+02   2.063  0.04193 *  
trend                              8.111e-01  3.839e-01   2.113  0.03736 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.706 on 91 degrees of freedom
Multiple R-Squared:     1,  Adjusted R-squared: 0.9999 
F-statistic: 1.135e+05 on 19 and 91 DF,  p-value: < 2.2e-16 


Estimation results for equation daily_vaccinations_per_million: 
=============================================================== 
daily_vaccinations_per_million = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + date.l4 + daily_vaccinations_per_million.l4 + value.l4 + date.l5 + daily_vaccinations_per_million.l5 + value.l5 + date.l6 + daily_vaccinations_per_million.l6 + value.l6 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            6.278e+02  1.822e+03   0.345  0.73123    
daily_vaccinations_per_million.l1  8.405e-01  1.046e-01   8.037 3.21e-12 ***
value.l1                           4.909e+04  1.423e+05   0.345  0.73087    
date.l2                            1.029e+03  2.309e+03   0.446  0.65682    
daily_vaccinations_per_million.l2  1.742e-01  1.347e-01   1.293  0.19919    
value.l2                           7.426e+04  1.357e+05   0.547  0.58546    
date.l3                           -1.618e+03  2.286e+03  -0.708  0.48096    
daily_vaccinations_per_million.l3 -1.780e-01  1.326e-01  -1.342  0.18292    
value.l3                           1.953e+05  1.344e+05   1.453  0.14972    
date.l4                            3.658e+02  2.242e+03   0.163  0.87077    
daily_vaccinations_per_million.l4  1.818e-01  1.260e-01   1.443  0.15236    
value.l4                          -1.597e+05  1.346e+05  -1.187  0.23843    
date.l5                            1.114e+03  2.306e+03   0.483  0.63005    
daily_vaccinations_per_million.l5  4.197e-02  1.269e-01   0.331  0.74161    
value.l5                           2.157e+05  1.389e+05   1.554  0.12377    
date.l6                           -1.793e+03  1.848e+03  -0.970  0.33467    
daily_vaccinations_per_million.l6 -2.543e-01  9.413e-02  -2.701  0.00824 ** 
value.l6                          -4.281e+04  1.439e+05  -0.297  0.76681    
const                              4.815e+06  1.646e+07   0.293  0.77056    
trend                              1.463e+03  6.555e+03   0.223  0.82389    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 29140 on 91 degrees of freedom
Multiple R-Squared: 0.9579, Adjusted R-squared: 0.9491 
F-statistic:   109 on 19 and 91 DF,  p-value: < 2.2e-16 


Estimation results for equation value: 
====================================== 
value = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + date.l4 + daily_vaccinations_per_million.l4 + value.l4 + date.l5 + daily_vaccinations_per_million.l5 + value.l5 + date.l6 + daily_vaccinations_per_million.l6 + value.l6 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)  
date.l1                           -9.094e-04  1.304e-03  -0.697   0.4874  
daily_vaccinations_per_million.l1 -2.550e-08  7.485e-08  -0.341   0.7341  
value.l1                           1.003e-01  1.018e-01   0.985   0.3272  
date.l2                            1.984e-03  1.653e-03   1.201   0.2330  
daily_vaccinations_per_million.l2 -5.861e-08  9.640e-08  -0.608   0.5447  
value.l2                           7.876e-02  9.710e-02   0.811   0.4194  
date.l3                            2.452e-04  1.636e-03   0.150   0.8812  
daily_vaccinations_per_million.l3  1.404e-07  9.494e-08   1.478   0.1428  
value.l3                           2.450e-01  9.620e-02   2.547   0.0125 *
date.l4                           -1.086e-05  1.605e-03  -0.007   0.9946  
daily_vaccinations_per_million.l4 -1.089e-08  9.016e-08  -0.121   0.9041  
value.l4                           2.046e-01  9.632e-02   2.124   0.0364 *
date.l5                           -2.960e-03  1.650e-03  -1.794   0.0762 .
daily_vaccinations_per_million.l5 -2.073e-07  9.083e-08  -2.282   0.0248 *
value.l5                           2.529e-01  9.939e-02   2.544   0.0126 *
date.l6                            2.679e-03  1.323e-03   2.025   0.0458 *
daily_vaccinations_per_million.l6  1.123e-07  6.737e-08   1.667   0.0990 .
value.l6                          -9.519e-02  1.030e-01  -0.924   0.3579  
const                             -1.890e+01  1.178e+01  -1.604   0.1121  
trend                             -7.687e-03  4.691e-03  -1.639   0.1048  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02085 on 91 degrees of freedom
Multiple R-Squared: 0.4856, Adjusted R-squared: 0.3782 
F-statistic: 4.521 on 19 and 91 DF,  p-value: 4.642e-07 



Covariance matrix of residuals:
                                     date daily_vaccinations_per_million
date                            2.912e+00                     -1.187e+04
daily_vaccinations_per_million -1.187e+04                      8.490e+08
value                          -2.724e-03                      6.711e+01
                                    value
date                           -0.0027238
daily_vaccinations_per_million 67.1054112
value                           0.0004349

Correlation matrix of residuals:
                                   date daily_vaccinations_per_million    value
date                            1.00000                        -0.2388 -0.07654
daily_vaccinations_per_million -0.23875                         1.0000  0.11043
value                          -0.07654                         0.1104  1.00000

From the perspective of the p-values associated with these models, each demonstrates a moderate number of significant variables. Although a model with ( p = 1 ) lag might appear overly simplistic and possibly inadequate for capturing the full dynamics of the dataset, a model with ( p = 6 ) lags could potentially offer a slight improvement. This enhancement in performance suggests that incorporating more lag terms allows the VAR model to better account for the temporal dependencies among the variables, thus providing a more accurate and insightful analysis.

Code
VARselect(rep_df.ts, lag.max=10, type="both")
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     1      1      1      1 

$criteria
                  1            2            3            4            5
AIC(n) 1.418188e+01 1.424910e+01 1.420589e+01 1.423788e+01 1.420512e+01
HQ(n)  1.433378e+01 1.449213e+01 1.454006e+01 1.466319e+01 1.472157e+01
SC(n)  1.455658e+01 1.484861e+01 1.503022e+01 1.528702e+01 1.547909e+01
FPE(n) 1.442787e+06 1.544077e+06 1.480779e+06 1.532486e+06 1.488478e+06
                  6            7            8            9           10
AIC(n) 1.423938e+01 1.427922e+01 1.435414e+01 1.446980e+01 1.441354e+01
HQ(n)  1.484697e+01 1.497794e+01 1.514401e+01 1.535080e+01 1.538567e+01
SC(n)  1.573816e+01 1.600282e+01 1.630256e+01 1.664303e+01 1.681159e+01
FPE(n) 1.548383e+06 1.622777e+06 1.765421e+06 2.005630e+06 1.924354e+06

According to the selection criterion, we select the model with p=1 and 5(second lowest AIC).

Code
summary(vars::VAR(rep_df.ts, p=1, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: date, daily_vaccinations_per_million, value 
Deterministic variables: both 
Sample size: 116 
Log Likelihood: -1315.974 
Roots of the characteristic polynomial:
0.9166 0.8849 0.08589
Call:
vars::VAR(y = rep_df.ts, p = 1, type = "both")


Estimation results for equation date: 
===================================== 
date = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            8.730e-01  5.020e-02  17.389   <2e-16 ***
daily_vaccinations_per_million.l1 -3.159e-07  1.863e-06  -0.170   0.8656    
value.l1                           1.101e+01  7.954e+00   1.385   0.1689    
const                              2.369e+03  9.338e+02   2.537   0.0126 *  
trend                              9.399e-01  3.699e-01   2.541   0.0124 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.845 on 111 degrees of freedom
Multiple R-Squared: 0.9999, Adjusted R-squared: 0.9999 
F-statistic: 5.251e+05 on 4 and 111 DF,  p-value: < 2.2e-16 


Estimation results for equation daily_vaccinations_per_million: 
=============================================================== 
daily_vaccinations_per_million = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            1.362e+03  9.238e+02   1.475    0.143    
daily_vaccinations_per_million.l1  9.047e-01  3.428e-02  26.393   <2e-16 ***
value.l1                          -3.402e+05  1.464e+05  -2.324    0.022 *  
const                             -2.529e+07  1.718e+07  -1.472    0.144    
trend                             -1.038e+04  6.807e+03  -1.524    0.130    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 33940 on 111 degrees of freedom
Multiple R-Squared: 0.9323, Adjusted R-squared: 0.9299 
F-statistic: 382.4 on 4 and 111 DF,  p-value: < 2.2e-16 


Estimation results for equation value: 
====================================== 
value = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            3.171e-03  5.818e-04   5.450 3.08e-07 ***
daily_vaccinations_per_million.l1  4.565e-08  2.159e-08   2.115   0.0367 *  
value.l1                           1.097e-01  9.218e-02   1.190   0.2367    
const                             -5.893e+01  1.082e+01  -5.446 3.13e-07 ***
trend                             -2.316e-02  4.287e-03  -5.402 3.80e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02138 on 111 degrees of freedom
Multiple R-Squared: 0.3924, Adjusted R-squared: 0.3705 
F-statistic: 17.92 on 4 and 111 DF,  p-value: 2.238e-11 



Covariance matrix of residuals:
                                     date daily_vaccinations_per_million
date                            3.402e+00                     -1.638e+04
daily_vaccinations_per_million -1.638e+04                      1.152e+09
value                           3.770e-03                     -8.916e+01
                                    value
date                            3.770e-03
daily_vaccinations_per_million -8.916e+01
value                           4.569e-04

Correlation matrix of residuals:
                                   date daily_vaccinations_per_million    value
date                            1.00000                        -0.2617  0.09562
daily_vaccinations_per_million -0.26165                         1.0000 -0.12289
value                           0.09562                        -0.1229  1.00000
Code
summary(vars::VAR(rep_df.ts, p=5, type='both'))

VAR Estimation Results:
========================= 
Endogenous variables: date, daily_vaccinations_per_million, value 
Deterministic variables: both 
Sample size: 112 
Log Likelihood: -1219.025 
Roots of the characteristic polynomial:
0.9039 0.8932 0.8932 0.7899 0.7899 0.775 0.7678 0.7678 0.6684 0.6684 0.5979 0.5796 0.5796 0.4668 0.4668
Call:
vars::VAR(y = rep_df.ts, p = 5, type = "both")


Estimation results for equation date: 
===================================== 
date = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + date.l4 + daily_vaccinations_per_million.l4 + value.l4 + date.l5 + daily_vaccinations_per_million.l5 + value.l5 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            8.097e-01  1.058e-01   7.653 1.61e-11 ***
daily_vaccinations_per_million.l1 -6.276e-06  5.840e-06  -1.075   0.2853    
value.l1                           6.927e+00  8.050e+00   0.860   0.3917    
date.l2                            2.009e-01  1.326e-01   1.515   0.1330    
daily_vaccinations_per_million.l2  8.339e-06  7.906e-06   1.055   0.2942    
value.l2                           8.777e+00  8.204e+00   1.070   0.2874    
date.l3                           -9.356e-02  1.342e-01  -0.697   0.4873    
daily_vaccinations_per_million.l3  1.013e-05  7.516e-06   1.347   0.1810    
value.l3                           2.046e+00  8.248e+00   0.248   0.8046    
date.l4                           -3.519e-02  1.313e-01  -0.268   0.7892    
daily_vaccinations_per_million.l4 -1.772e-05  7.104e-06  -2.495   0.0143 *  
value.l4                          -1.658e+01  8.278e+00  -2.003   0.0480 *  
date.l5                            8.711e-03  1.042e-01   0.084   0.9335    
daily_vaccinations_per_million.l5  9.113e-06  5.232e-06   1.742   0.0848 .  
value.l5                           3.697e+00  8.121e+00   0.455   0.6499    
const                              2.041e+03  1.326e+03   1.540   0.1270    
trend                              8.247e-01  5.242e-01   1.573   0.1190    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 1.67 on 95 degrees of freedom
Multiple R-Squared:     1,  Adjusted R-squared:     1 
F-statistic: 1.445e+05 on 16 and 95 DF,  p-value: < 2.2e-16 


Estimation results for equation daily_vaccinations_per_million: 
=============================================================== 
daily_vaccinations_per_million = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + date.l4 + daily_vaccinations_per_million.l4 + value.l4 + date.l5 + daily_vaccinations_per_million.l5 + value.l5 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)    
date.l1                            1.871e+02  1.872e+03   0.100   0.9206    
daily_vaccinations_per_million.l1  9.325e-01  1.033e-01   9.025 2.01e-14 ***
value.l1                          -3.737e+05  1.424e+05  -2.623   0.0101 *  
date.l2                            3.565e+03  2.346e+03   1.520   0.1319    
daily_vaccinations_per_million.l2  2.095e-01  1.399e-01   1.498   0.1376    
value.l2                           2.966e+05  1.451e+05   2.043   0.0438 *  
date.l3                           -2.426e+03  2.374e+03  -1.022   0.3095    
daily_vaccinations_per_million.l3 -1.810e-01  1.330e-01  -1.361   0.1768    
value.l3                          -3.014e+04  1.459e+05  -0.207   0.8368    
date.l4                           -2.803e+03  2.322e+03  -1.207   0.2304    
daily_vaccinations_per_million.l4  1.316e-01  1.257e-01   1.047   0.2977    
value.l4                           3.679e+04  1.465e+05   0.251   0.8022    
date.l5                            2.090e+03  1.843e+03   1.134   0.2597    
daily_vaccinations_per_million.l5 -2.150e-01  9.256e-02  -2.323   0.0223 *  
value.l5                          -1.888e+04  1.437e+05  -0.131   0.8957    
const                             -1.137e+07  2.345e+07  -0.485   0.6289    
trend                             -4.856e+03  9.274e+03  -0.524   0.6018    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 29550 on 95 degrees of freedom
Multiple R-Squared: 0.955,  Adjusted R-squared: 0.9474 
F-statistic:   126 on 16 and 95 DF,  p-value: < 2.2e-16 


Estimation results for equation value: 
====================================== 
value = date.l1 + daily_vaccinations_per_million.l1 + value.l1 + date.l2 + daily_vaccinations_per_million.l2 + value.l2 + date.l3 + daily_vaccinations_per_million.l3 + value.l3 + date.l4 + daily_vaccinations_per_million.l4 + value.l4 + date.l5 + daily_vaccinations_per_million.l5 + value.l5 + const + trend 

                                    Estimate Std. Error t value Pr(>|t|)   
date.l1                            4.080e-03  1.295e-03   3.151  0.00218 **
daily_vaccinations_per_million.l1  1.271e-07  7.148e-08   1.778  0.07858 . 
value.l1                           1.007e-01  9.853e-02   1.022  0.30922   
date.l2                            8.453e-04  1.623e-03   0.521  0.60364   
daily_vaccinations_per_million.l2 -1.079e-07  9.676e-08  -1.115  0.26778   
value.l2                          -1.087e-05  1.004e-01   0.000  0.99991   
date.l3                           -3.989e-03  1.642e-03  -2.429  0.01700 * 
daily_vaccinations_per_million.l3  4.500e-09  9.199e-08   0.049  0.96109   
value.l3                          -1.007e-01  1.009e-01  -0.998  0.32103   
date.l4                            2.680e-03  1.607e-03   1.668  0.09858 . 
daily_vaccinations_per_million.l4 -3.637e-08  8.695e-08  -0.418  0.67668   
value.l4                           1.543e-01  1.013e-01   1.523  0.13113   
date.l5                           -1.523e-03  1.275e-03  -1.194  0.23531   
daily_vaccinations_per_million.l5  9.103e-08  6.403e-08   1.422  0.15841   
value.l5                           1.812e-01  9.939e-02   1.823  0.07141 . 
const                             -3.895e+01  1.622e+01  -2.401  0.01832 * 
trend                             -1.513e-02  6.415e-03  -2.359  0.02038 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02044 on 95 degrees of freedom
Multiple R-Squared: 0.5238, Adjusted R-squared: 0.4436 
F-statistic:  6.53 on 16 and 95 DF,  p-value: 1.168e-09 



Covariance matrix of residuals:
                                     date daily_vaccinations_per_million
date                            2.790e+00                     -1.391e+04
daily_vaccinations_per_million -1.391e+04                      8.734e+08
value                          -8.582e-05                     -2.959e+01
                                    value
date                           -8.582e-05
daily_vaccinations_per_million -2.959e+01
value                           4.180e-04

Correlation matrix of residuals:
                                    date daily_vaccinations_per_million
date                            1.000000                       -0.28181
daily_vaccinations_per_million -0.281806                        1.00000
value                          -0.002513                       -0.04898
                                   value
date                           -0.002513
daily_vaccinations_per_million -0.048975
value                           1.000000

Examining the VAR models with ( p = 1 ) and ( p = 5 ) lag orders, we notice the presence of some variables that do not significantly contribute to the model. Notably, the model with ( p = 5 ) lag periods tends to have fewer significant variables compared to the ( p = 1 ) model. However, opting for ( p = 1 ) may overly simplify the model, potentially failing to capture all the relevant and significant relationships among the variables. Thus, while ( p = 1 ) offers a more parsimonious model, it might not sufficiently account for the complexity of the dynamics within the data, suggesting that a balance must be struck between model simplicity and its ability to elucidate significant interactions.

4.3 Cross Validation

To validate our hypothesis and ensure the robustness of our model selection, we employ cross-validation to compare the Root Mean Squared Error (RMSE) of the two chosen models. By assessing the RMSE, we can quantitatively determine which model provides a more accurate fit to the data.

Code
#n=length(inde_df.ts) 351
#n-k=91; 260/52=5;
n=length(inde_df.ts)
k=91

rmse1 <- matrix(NA, 52,5)
rmse2 <- matrix(NA, 52,5)
week<-c()

# Convert data frame to time series object
st <- tsp(inde_df.ts )[1]+(k-1)/52


for(i in 1:1)
{
  
  xtrain <- window(inde_df.ts, end=st + i-1)
  xtest <- window(inde_df.ts, start=st + (i-1) + 1/52, end=st + i)
  
  ######## first Model ############
  fit <- VAR(inde_df.ts, p=3, type='both')
  fcast <- predict(fit, n.ahead = 52)
  
  fvacc<-fcast$fcst$daily_vaccinations_per_million
  fsupp<-fcast$fcst$value

  ff<-data.frame(fvacc[,1],fsupp[,1]) #collecting the forecasts for 3 variables
  
  week<-st + (i-1) + 1/52 
  
  ff<-ts(ff,start=c(week,1),frequency = 52)
  
  a = 52*i-51 
  b = 52*i 
  
  ##### collecting errors ######
  rmse1[c(a:b),]  <-sqrt((ff-xtest[,2:3])^2)
  
  ######## Second Model ############
  fit2 <- vars::VAR(inde_df.ts, p=1, type='both')
  fcast2 <- predict(fit2, n.ahead = 52)
  
  fvacc<-fcast2$fcst$daily_vaccinations_per_million
  fsupp<-fcast2$fcst$value

  ff2<-data.frame(fvacc[,1],fsupp[,1])
  
  week<-st + (i-1) + 1/52
  
  ff2<-ts(ff2,start=c(week,1),frequency = 52)
  
  a = 52*i-51 
  b = 52*i
  rmse2[c(a:b),]  <-sqrt((ff2-xtest[,2:3])^2)
}
Code
plot(1:52, rowMeans(rmse1,na.rm=TRUE), type="l", col=2, xlab="Week", ylab="RMSE")
lines(1:52, rowMeans(rmse2,na.rm=TRUE), type="l",col=3)
legend("topleft",legend=c("VAR 3","VAR 1"),col=2:4,lty=1)

VAR(1) appears to have lower RMSE than VAR(3), therefore VAR(1) is a better fit.

Code
#n=length(inde_df.ts) 351
#n-k=91; 260/52=5;
n=length(demo_df.ts)
k=91

rmse1 <- matrix(NA, 52,5)
rmse2 <- matrix(NA, 52,5)
week<-c()

# Convert data frame to time series object
st <- tsp(demo_df.ts )[1]+(k-1)/52


for(i in 1:1)
{
  
  xtrain <- window(demo_df.ts, end=st + i-1)
  xtest <- window(demo_df.ts, start=st + (i-1) + 1/52, end=st + i)
  
  ######## first Model ############
  fit <- VAR(demo_df.ts, p=6, type='both')
  fcast <- predict(fit, n.ahead = 52)
  
  fvacc<-fcast$fcst$daily_vaccinations_per_million
  fsupp<-fcast$fcst$value

  ff<-data.frame(fvacc[,1],fsupp[,1]) #collecting the forecasts for 3 variables
  
  week<-st + (i-1) + 1/52 
  
  ff<-ts(ff,start=c(week,1),frequency = 52)
  
  a = 52*i-51 
  b = 52*i 
  
  ##### collecting errors ######
  rmse1[c(a:b),]  <-sqrt((ff-xtest[,2:3])^2)
  
  ######## Second Model ############
  fit2 <- vars::VAR(demo_df.ts, p=1, type='both')
  fcast2 <- predict(fit2, n.ahead = 52)
  
  fvacc<-fcast2$fcst$daily_vaccinations_per_million
  fsupp<-fcast2$fcst$value

  ff2<-data.frame(fvacc[,1],fsupp[,1])
  
  week<-st + (i-1) + 1/52
  
  ff2<-ts(ff2,start=c(week,1),frequency = 52)
  
  a = 52*i-51 
  b = 52*i
  rmse2[c(a:b),]  <-sqrt((ff2-xtest[,2:3])^2)
}
Code
plot(1:52, rowMeans(rmse1,na.rm=TRUE), type="l", col=2, xlab="Week", ylab="RMSE")
lines(1:52, rowMeans(rmse2,na.rm=TRUE), type="l",col=3)
legend("topleft",legend=c("VAR 6","VAR 1"),col=2:4,lty=1)

VAR(6) appears to have lower RMSE than VAR(1), therefore VAR(6) is a better fit.

Code
#n=length(inde_df.ts) 351
#n-k=91; 260/52=5;
n=length(rep_df.ts)
k=91

rmse1 <- matrix(NA, 52,5)
rmse2 <- matrix(NA, 52,5)
week<-c()

# Convert data frame to time series object
st <- tsp(rep_df.ts )[1]+(k-1)/52


for(i in 1:1)
{
  
  xtrain <- window(rep_df.ts, end=st + i-1)
  xtest <- window(rep_df.ts, start=st + (i-1) + 1/52, end=st + i)
  
  ######## first Model ############
  fit <- VAR(rep_df.ts, p=5, type='both')
  fcast <- predict(fit, n.ahead = 52)
  
  fvacc<-fcast$fcst$daily_vaccinations_per_million
  fsupp<-fcast$fcst$value

  ff<-data.frame(fvacc[,1],fsupp[,1]) #collecting the forecasts for 3 variables
  
  week<-st + (i-1) + 1/52 
  
  ff<-ts(ff,start=c(week,1),frequency = 52)
  
  a = 52*i-51 
  b = 52*i 
  
  ##### collecting errors ######
  rmse1[c(a:b),]  <-sqrt((ff-xtest[,2:3])^2)
  
  ######## Second Model ############
  fit2 <- vars::VAR(rep_df.ts, p=1, type='both')
  fcast2 <- predict(fit2, n.ahead = 52)
  
  fvacc<-fcast2$fcst$daily_vaccinations_per_million
  fsupp<-fcast2$fcst$value

  ff2<-data.frame(fvacc[,1],fsupp[,1])
  
  week<-st + (i-1) + 1/52
  
  ff2<-ts(ff2,start=c(week,1),frequency = 52)
  
  a = 52*i-51 
  b = 52*i
  rmse2[c(a:b),]  <-sqrt((ff2-xtest[,2:3])^2)
}
Code
plot(1:52, rowMeans(rmse1,na.rm=TRUE), type="l", col=2, xlab="Week", ylab="RMSE")
lines(1:52, rowMeans(rmse2,na.rm=TRUE), type="l",col=3)
legend("topleft",legend=c("VAR 5","VAR 1"),col=2:4,lty=1)

VAR(5) appears to have lower RMSE than VAR(1), therefore VAR(5) is a better fit.

4.4 Forecast

In this section, we used the model we selected from the results of cross validation to forecast the future trend.

The best model for this relationship is VAR(1).

Code
var1 <- VAR(inde_df.ts, p=1, type="const")
forecast(var1) %>%
  autoplot() + xlab("Week")+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA))

The forecast results indicate a projected decline in daily vaccinations per million in the upcoming period, which aligns with our current understanding of the trends. Additionally, the support rate for the independent party is expected to decrease overall. These trends suggest significant shifts in public health dynamics and political landscapes, which warrant close monitoring and analysis to better understand the underlying factors driving these changes.

The best model for this relationship is VAR(6).

Code
var1 <- VAR(demo_df.ts, p=6, type="const")
forecast(var1) %>%
  autoplot() + xlab("Week")+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA))

The forecast results indicate a projected decline in daily vaccinations per million in the upcoming period, which aligns with our current understanding of the trends. Additionally, the support rate for the democratic party is expected to be stable with some fluctuations overall. These trends suggest significant shifts in public health dynamics and political landscapes, which warrant close monitoring and analysis to better understand the underlying factors driving these changes.

The best model for this relationship is VAR(5).

Code
var1 <- VAR(rep_df.ts, p=5, type="const")
forecast(var1) %>%
  autoplot() + xlab("Week")+ theme_bw()+
      theme(plot.background = element_rect(fill = "#D9E3F1", color = NA),
            panel.background = element_rect(fill = "#D9E3F1", color = NA))

The forecast results indicate a projected decline in daily vaccinations per million in the upcoming period, which aligns with our current understanding of the trends. Additionally, the support rate for the republican party is expected to be stable with some fluctuations overall. These trends suggest significant shifts in public health dynamics and political landscapes, which warrant close monitoring and analysis to better understand the underlying factors driving these changes.